T2I-CompBench++: An Enhanced and Comprehensive Benchmark for Compositional Text-to-image Generation

Huang, Kaiyi; Duan, Chengqi; Sun, Kaiyue; Xie, Enze; Li, Zhenguo; Liu, Xihui

arXiv:2307.06350 (cs)

[Submitted on 12 Jul 2023 (v1), last revised 8 Mar 2025 (this version, v3)]

Abstract: Despite impressive advances in text-to-image models, they often struggle to compose complex scenes containing multiple objects with diverse attributes and relationships. To address this challenge, we present T2I-CompBench++, an enhanced benchmark for compositional text-to-image generation. T2I-CompBench++ comprises 8,000 compositional text prompts organized into four primary groups: attribute binding, object relationships, generative numeracy, and complex compositions. These are further divided into eight sub-categories, including newly introduced ones such as 3D-spatial relationships and numeracy. In addition to the benchmark, we propose enhanced evaluation metrics designed to assess these diverse compositional challenges. These include a detection-based metric tailored to 3D-spatial relationships and numeracy, and an analysis of using Multimodal Large Language Models (MLLMs), namely GPT-4V and ShareGPT4V, as evaluation metrics. Our experiments benchmark 11 text-to-image models on T2I-CompBench++, including state-of-the-art models such as FLUX.1, SD3, DALL-E 3, PixArt-α, and SD-XL. We also conduct comprehensive evaluations to validate the effectiveness of our metrics and to explore the potential and limitations of MLLMs.
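The abstract mentions a detection-based metric for numeracy without giving its formula. As a rough illustration only, the sketch below shows the general idea of checking object counts with a detector: it assumes an object detector (not specified here) has already produced labelled detections with confidences, and that the expected per-class counts have already been parsed from the prompt. The function name `numeracy_score`, the 0.5 confidence threshold, and the per-class exact-match scoring are hypothetical choices, not the benchmark's published metric.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical detector output for one image: (class_label, confidence) pairs.
Detection = Tuple[str, float]


def numeracy_score(detections: List[Detection],
                   expected: Dict[str, int],
                   threshold: float = 0.5) -> float:
    """Return the fraction of prompted object classes whose detected count
    exactly matches the count requested in the prompt.

    Assumptions (illustrative only): detections below `threshold` are
    discarded, and every prompted class is weighted equally.
    """
    if not expected:
        raise ValueError("expected counts must be non-empty")
    # Count confident detections per class label.
    counts = Counter(label for label, conf in detections if conf >= threshold)
    # A missing label in the Counter yields 0, so undetected classes count as 0.
    matches = sum(1 for label, n in expected.items() if counts[label] == n)
    return matches / len(expected)


if __name__ == "__main__":
    # Prompt: "three apples and two cups"; one cup falls below the threshold.
    dets = [("apple", 0.9), ("apple", 0.8), ("apple", 0.7),
            ("cup", 0.95), ("cup", 0.4)]
    print(numeracy_score(dets, {"apple": 3, "cup": 2}))  # 0.5
```

Here the apples are counted correctly but only one cup survives the threshold, so half of the prompted classes match. A real metric would also have to handle detector errors, duplicate boxes, and synonyms in the prompt, which this sketch ignores.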

Comments:	This is the journal version. For conference version (T2I-CompBench): arXiv:2307.06350v2. Project page: this https URL
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2307.06350 [cs.CV]
	(or arXiv:2307.06350v3 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2307.06350

Submission history

From: Kaiyi Huang
[v1] Wed, 12 Jul 2023 17:59:42 UTC (13,191 KB)
[v2] Mon, 30 Oct 2023 11:42:42 UTC (13,449 KB)
[v3] Sat, 8 Mar 2025 14:57:45 UTC (17,859 KB)